SwePub
Result list for search "db:Swepub ;pers:(Lu Zhonghai);pers:(Chen Qinyu)"

  • Results 1-5 of 5
1.
  • Chen, Qinyu, et al. (author)
  • An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective
  • 2020
  • In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. - : Institute of Electrical and Electronics Engineers (IEEE). - 1063-8210 .- 1557-9999. ; 28:6, pp. 1540-1544
  • Journal article (peer-reviewed). Abstract:
    • Convolutional neural networks (CNNs) have emerged as one of the most popular approaches in many fields. These networks deliver better performance as they grow deeper and larger. However, their complicated computation and huge storage requirements impede hardware implementation. To address this problem, quantized networks have been proposed. In addition, various convolutional structures are designed to meet the requirements of different applications. For example, compared with the traditional convolutions (CONVs) used for image classification, CONVs for image generation are usually composed of traditional CONVs, dilated CONVs, and transposed CONVs, leading to a difficult hardware mapping problem. In this brief, we translate the difficult mapping problem into a sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs by exploiting their sparsity and low bit-width characteristics. To this end, we propose an ineffectual data removing (IDR) mechanism to remove both regular and irregular sparsity based on dual-channel processing elements (PEs). In addition, a flexible layered load balance (LLB) mechanism is introduced to alleviate load imbalance. The accelerator is implemented with 65-nm technology with a core size of 2.56 mm². It can achieve 3.72-TOPS/W energy efficiency at 50.1 mW, which makes it a promising design for embedded devices.
2.
  • Chen, Qinyu, et al. (author)
  • An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks
  • 2019
  • In: Electronics. - : MDPI. - 2079-9292. ; 8:4
  • Journal article (peer-reviewed). Abstract:
    • Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition, speech processing, and many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs are proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse grain task partitioning (CGTP) strategy, the proposed accelerator with heterogeneous computing units, supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. In addition, a hardware-friendly algorithm is proposed to simplify the activation and quantification process, which can reduce the power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantification-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantification, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented with TSMC 40 nm technology with a core size of . It can achieve TOPS/W energy efficiency and area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
3.
  • Chen, Qinyu, et al. (author)
  • Enabling Energy-Efficient Inference for Self-Attention Mechanisms in Neural Networks
  • 2022
  • In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022). - : Institute of Electrical and Electronics Engineers (IEEE). ; pp. 25-28
  • Conference paper (peer-reviewed). Abstract:
    • The study of specialized accelerators tailored for neural networks has become a promising topic in recent years. Existing neural network accelerators are usually designed for convolutional neural networks (CNNs) or recurrent neural networks (RNNs); however, less attention has been paid to attention mechanisms, an emerging neural network primitive with the ability to identify the relations within input entities. Self-attention-oriented models such as the Transformer have achieved great performance on natural language processing, computer vision, and machine translation. However, the self-attention mechanism has intrinsically expensive computational workloads, which increase quadratically with the number of input entities. Therefore, in this work, we propose a software-hardware co-design solution for energy-efficient self-attention inference. A prediction-based approximate self-attention mechanism is introduced to substantially reduce the runtime as well as power consumption, and a specialized hardware architecture is then designed to further increase the speedup. The design is implemented on a Xilinx XC7Z035 FPGA, and the results show that the energy efficiency is improved by 5.7x with less than 1% accuracy loss.
4.
  • Chen, Qinyu, et al. (author)
  • Smilodon : An Efficient Accelerator for Low Bit-Width CNNs with Task Partitioning
  • 2019
  • In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). - : IEEE. - 9781728103976
  • Conference paper (peer-reviewed). Abstract:
    • Convolutional Neural Networks (CNNs) have been widely applied in various fields such as image and video recognition, recommender systems, and natural language processing. However, their massive size and intensive computation loads prevent feasible deployment in practice, especially on embedded systems. As a highly competitive candidate, low bit-width CNNs are proposed to enable efficient implementation. In this paper, we propose Smilodon, a scalable, efficient accelerator for low bit-width CNNs based on a parallel streaming architecture, optimized with a task partitioning strategy. We also present 3D systolic-like computing arrays suited to convolutional layers. Our design is implemented on a Zynq XC7Z020 FPGA and satisfies real-time requirements with a throughput of 1,622 FPS while consuming 2.1 W. To the best of our knowledge, our accelerator is superior to state-of-the-art works in the tradeoff among throughput, power efficiency, and area efficiency.
5.
  • Fu, Yuxiang, et al. (author)
  • Congestion-Aware Dynamic Elevator Assignment for Partially Connected 3D-NoCs
  • 2019
  • In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). - : IEEE. - 9781728103976
  • Conference paper (peer-reviewed). Abstract:
    • The combination of Network-on-Chip (NoC) and 3D IC technology, the 3D NoC, has been proven to achieve a great improvement in both network performance and power consumption compared to 2D NoCs. In the traditional 3D NoC, all routers are vertically connected. Due to the large overhead of Through-Silicon Vias (TSVs), e.g., low fabrication yield and occupied silicon area, the partially connected 3D NoC has emerged. The assignment method determines the traffic loads of the vertical links (elevators) and thus has a great impact on the performance of 3D NoCs. In this paper, we propose a congestion-aware dynamic elevator assignment (CDA) scheme, which takes both distance factors and network congestion information into account. Experiments show that the proposed CDA scheme improves performance by 67% to 87% compared to the random selection scheme, 8% to 25% compared to SelByDis-1, and 13% to 18% compared to SelByDis-2.
Type of publication
conference paper (3)
journal article (2)
Type of content
peer-reviewed (5)
Author/editor
Li, Li (4)
Fu, Yuxiang (4)
Song, Wenqing (3)
Zhang, Chuan (3)
Cheng, Kaifeng (2)
Huang, Yan (1)
Sun, Rui (1)
Chen, Kai (1)
Sun, Congyi (1)
Gao, Chang (1)
He, Guoqiang (1)
University
Kungliga Tekniska Högskolan (5)
Language
English (5)
Research subject (UKÄ/SCB)
Engineering and Technology (4)
Natural Sciences (1)
